Structural Compression Of Document Images With PDF/A
Authors
Abstract
This paper describes a new algorithm for compressing document images. It separates the text layer from the graphics layer in the source image and compresses each layer with the most suitable standard algorithm. The compressed layers are then placed into a PDF/A file, a standardized format for long-term archiving of electronic documents. Using a separation algorithm tailored to each type of document makes it possible to store the image to best advantage. In addition, the text layer can be processed by an OCR system, and the recognized text can be placed in the same PDF/A file to support copy-and-paste and text search.
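The layer-separation idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the fixed threshold, the function names (`split_layers`, `pack_bits`, `compress_layers`), and the use of zlib (Deflate) for both layers are assumptions made for illustration. A real pipeline would likely pick a bilevel codec for the text mask and a continuous-tone codec for the background.

```python
import zlib


def split_layers(gray, threshold=128):
    """Split a grayscale image (rows of 0-255 ints) into two layers.

    Returns a 0/1 text mask (1 = dark "text" pixel) and a background
    layer in which text pixels are replaced by the mean of the row's
    non-text pixels. The fixed threshold is an illustrative assumption.
    """
    mask = [[1 if px < threshold else 0 for px in row] for row in gray]
    background = []
    for row, mask_row in zip(gray, mask):
        light = [px for px, m in zip(row, mask_row) if not m]
        fill = sum(light) // len(light) if light else 255
        background.append([fill if m else px for px, m in zip(row, mask_row)])
    return mask, background


def pack_bits(mask):
    """Pack a 0/1 mask into bytes, 8 pixels per byte, each row zero-padded."""
    out = bytearray()
    for row in mask:
        for i in range(0, len(row), 8):
            chunk = row[i:i + 8]
            byte = 0
            for bit in chunk:
                byte = (byte << 1) | bit
            byte <<= 8 - len(chunk)  # pad the last byte of the row
            out.append(byte)
    return bytes(out)


def compress_layers(gray, threshold=128):
    """Compress each layer separately (zlib stands in for both codecs)."""
    mask, background = split_layers(gray, threshold)
    text_stream = zlib.compress(pack_bits(mask), 9)
    flat_background = bytes(px for row in background for px in row)
    background_stream = zlib.compress(flat_background, 9)
    return text_stream, background_stream
```

Filling the text pixels in the background layer smooths it, which tends to help whatever codec is applied to that layer; the two resulting streams would then be stored as separate image objects in the output file.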
Related resources
Optimizing PDF output size of TeX documents
There are several tools for generating PDF output from a TeX document. By choosing the appropriate tools and configuring them properly, it is possible to reduce the PDF output size by a factor of 3 or even more, thus reducing document download times, hosting and archiving costs. We enumerate the most common tools, and show how to configure them to reduce the size of text, fonts, images and cros...
2 Transform and Encoding Algorithm
The present contribution proposes a new, remarkably efficient image compression algorithm for gray-level images based on dyadic wavelet transformation. In order to achieve perfect reconstruction, orthogonal decomposition is applied. Scalar quantization of wavelet coefficients is combined with run-length coding. Code word assignment is performed by semi-adaptive Huffman coding (determined by validity t...
An introduction to source coding
This book provides a global understanding of source coding with an overview of practical coding schemes for speech, music and images. The first section covers background, theoretical material. It shows how source coding...
Efficient document rendering with enhanced run length encoding
Document imaging and transmission systems (typically MFPs) require image rendering methods that are both effective and efficient, that support standard data formats for a variety of document types, and that allow for real-time implementation. Since most conventional raster formats (e.g. TIFF, PDF, JPEG) are designed for use with either black and white text, or continuous-tone images, more specialized renderi...
A qualitative study analyzing how health-sector experts use medical images
Introduction: In the health sector, an image functions as a form of document that can convey a considerable amount of information. Using this type of information can make medical experts more effective in their work. This study aimed to survey how health experts use medical images in their practice. Methods: This applied qualitative study was carried out in 1392 (2013). The study p...